Analysis of Coordinating Constructions in a Dependency Treebank

Authors

Vladislav Kuboň

Markéta Lopatková

Jiří Mírovský

Abstract

This paper summarizes the results of an automatic analysis of coordinating constructions and appositions in the Prague Dependency Treebank, carried out with the method of analysis by reduction. The experiments are performed on a large subset of the treebank, obtained by a query that yields more than 4,300 suitable sentences, together with their tree structures, containing coordinations and appositions. The automatic procedure is complemented by a manual analysis of the reasons why certain sentences (trees) were not fully reduced. This analysis provides better insight into the phenomena of coordination and apposition and their formal properties.

Dependency trees have a long tradition in linguistics, especially in the description of the Slavic languages of Central and Eastern Europe. The history of linguistics has witnessed many heated discussions between followers of the constituent-tree tradition and proponents of dependency trees, with neither side usually able to persuade the other of the advantages of its type of tree. Even so, the dependency notation seems to have recently been recognized as an efficient and transparent data type for describing syntactic relations in treebanks for a number of languages. One of the dependency treebanks that has become quite popular among linguists thanks to the thoroughness of its annotation is the Prague Dependency Treebank (PDT) (Bejček et al. 2013), the corpus exploited in this paper. One of the difficulties faced by the dependency notation is the need to express within a dependency tree not only dependencies, but also relations that are, by their nature, not dependencies.
This is a well-known issue: Lucien Tesnière (Tesnière 1959), a member of the Prague Linguistic Circle, already distinguished between structural relations, which we nowadays call dependency (‘connexion’), and coordinating relationships (‘junction’).¹ In this paper we study primarily the relationships of coordination and apposition, which pose a great challenge to any dependency formalism. The analysis is performed by means of analysis by reduction (Lopatková, Plátek, and Kuboň 2005; Lopatková, Plátek, and Sgall 2007), a procedure which naturally defines governing and dependent words in a dependency relationship; we enrich the procedure to also capture relations of coordination and apposition.

We have applied the analysis by reduction automatically to the dependency trees of selected sentences from the Prague Dependency Treebank, and the results obtained by this method were then analyzed manually. This approach brings two kinds of results: it provides better insight into the problem of coordination and its annotation in the corpus, and it also helps to identify potential annotation inconsistencies in the corpus. The theory we adhere to in our investigations, the dependency-based Functional Generative Description (FGD), is described primarily in (Sgall, Hajičová, and Panevová 1986).

¹ However, such a conception is not accepted without reservations; there are influential approaches that capture coordination as a (type of) dependency relation, see esp. (Mel’čuk 1988).

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Analysis by Reduction

The original method of analysis by reduction (AR) makes it possible to formulate the relationship between dependency and word order (Lopatková, Plátek, and Kuboň 2005).
This approach is especially beneficial for modeling the syntactic structure of languages with a high degree of free word order, where dependency structure and word order are only loosely related. Let us now describe the ideas behind the method used for sentence analysis.

Analysis by reduction is based on a stepwise simplification of the analyzed sentence. It defines possible sequences of reductions (deletions) in the sentence: each step of AR deletes at least one word of the input sentence, and in specific cases the deletion is accompanied by a shift of a word form to a different word-order position. Let us stress the basic constraints imposed on the analysis by reduction, namely (i) the obvious constraint on preserving individual word forms, their morphological characteristics and/or their surface dependency relations, and (ii) the constraint on preserving correctness (a grammatically correct sentence must remain correct after its simplification). The basic principles of AR can be illustrated by the following Czech sentence (1).
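Independently of example (1), the reduction loop can be sketched in Python. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: the tree is given as a list of head indices, exactly one leaf (a word with no remaining dependents) is deleted per step, surviving words keep their original order (no shifts), and the grammaticality check of real AR is replaced by a hypothetical `allowed` hook. The function name `reduce_sentence` and the toy sentence "Petr volal domů" ("Petr called home") are illustrative choices, and the paper's enrichment for coordination and apposition is not modeled. Note the direction: here the tree is given and deletability is derived from it, whereas in AR proper, deletability judgments are what define the dependencies.

```python
from __future__ import annotations

from typing import Callable


def reduce_sentence(
    words: list[str],
    heads: list[int],
    allowed: Callable[[int, list[int]], bool] = lambda i, remaining: True,
) -> tuple[list[list[str]], bool]:
    """Greedy sketch of analysis by reduction over a dependency tree.

    words[i] is a word form; heads[i] is the index of its governor, or -1
    for the root (the predicate). Each step deletes exactly one word that
    has no remaining dependents. `allowed(i, remaining)` is a hypothetical
    hook standing in for the grammaticality check of real AR.
    Returns the sequence of (partially) reduced sentences and a flag telling
    whether the sentence was fully reduced to its root.
    """
    remaining = list(range(len(words)))
    steps = [[words[i] for i in remaining]]
    while True:
        # Indices that still govern at least one remaining word.
        governors = {heads[i] for i in remaining}
        # Deletable words: non-root leaves that the hook accepts.
        candidates = [
            i
            for i in remaining
            if heads[i] != -1 and i not in governors and allowed(i, remaining)
        ]
        if not candidates:
            break
        # Deterministic choice: delete the rightmost deletable leaf.
        remaining.remove(candidates[-1])
        # Surviving words keep their original order; no shifts are modeled.
        steps.append([words[i] for i in remaining])
    fully_reduced = len(remaining) == 1 and heads[remaining[0]] == -1
    return steps, fully_reduced


# Toy example (chosen for illustration): "Petr volal domů" ("Petr called home"),
# where both "Petr" and "domů" depend on the predicate "volal".
words = ["Petr", "volal", "domů"]
heads = [1, -1, 1]

steps, ok = reduce_sentence(words, heads)
print(steps, ok)
# → [['Petr', 'volal', 'domů'], ['Petr', 'volal'], ['volal']] True

# A hypothetical constraint that forbids deleting "domů" blocks full reduction.
blocked_steps, blocked_ok = reduce_sentence(words, heads, allowed=lambda i, rem: i != 2)
print(blocked_steps, blocked_ok)
# → [['Petr', 'volal', 'domů'], ['volal', 'domů']] False
```

Because a word becomes deletable only once all of its dependents are gone, a governor can never disappear before its dependents. The second call shows, in miniature, how a single blocking constraint leaves a sentence not fully reduced, which is the kind of outcome the manual analysis then has to explain.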

Publication date: 2015